Abstract

We establish that the min-sum message-passing algorithm and its asynchronous variants converge for a large class of unconstrained convex optimization problems.

1 Introduction

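Consider an optimization problem of the form

$$
\begin{aligned}
\text{minimize} \quad & F(x) \triangleq \sum_{C \in \mathcal{C}} f_C(x_C) \\
\text{subject to} \quad & x \in \mathcal{X}^V.
\end{aligned}
\tag{1.1}
$$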

Here, the vector of decision variables $x$ is indexed by a finite set $V = \{1, \ldots, n\}$. Each decision variable takes values in the set $\mathcal{X}$. The set $\mathcal{C}$ is a collection of subsets of the index set $V$. This collection describes an additive decomposition of the objective function. We associate with each set $C \in \mathcal{C}$ a component function (or factor) $f_C : \mathcal{X}^C \to \mathbb{R}$, which takes values as a function of those components $x_C$¹ of the vector $x$ identified by the elements of $C$.



¹Given a vector $x \in \mathcal{X}^V$ and a subset $A \subset V$, we use the notation $x_A = (x_i,\ i \in A) \in \mathcal{X}^A$ for the vector of components of $x$ specified by the set $A$.
The min-sum algorithm is a method for optimization problems of the form (1.1). It is one of a class of methods known as message-passing algorithms. These algorithms have recently been the subject of considerable research across a number of fields, including communications, artificial intelligence, statistical physics, and theoretical computer science. Interest in message-passing algorithms has been sparked by their success in solving certain classes of NP-hard combinatorial optimization problems, such as the decoding of low-density parity-check codes and turbo codes (e.g., [U ^]), or the solution of certain classes of satisfiability problems (e.g., [HIS]).

Despite their successes, message-passing algorithms remain poorly 
understood. For example, conditions for convergence and accurate 
resulting solutions are not well characterized. 

In this paper, we consider cases where $\mathcal{X} = \mathbb{R}$ and the optimization problem is continuous. One such case that has been examined previously in the literature is where the objective is pairwise separable (i.e., $|C| \le 2$ for all $C \in \mathcal{C}$) and the component functions $\{f_C(\cdot)\}$ are quadratic and convex. Here, the min-sum algorithm is known to compute the optimal solution when it converges [6, 7, 8], and sufficient conditions for convergence identify a broad class of problems [21 HD].

Our main contribution is the analysis of cases where the functions 
are convex but not necessarily quadratic. We establish that the min- 
sum algorithm and its asynchronous variants converge for a large class 
of such problems. The main sufficient condition is that of scaled diago- 
nal dominance. This condition is similar to known sufficient conditions 
for asynchronous convergence of other decentralized optimization al- 
gorithms, such as coordinate descent and gradient descent. 

Analysis of the convex case has been an open challenge and its 
resolution advances the state of understanding in the growing literature 
on message-passing algorithms. Further, it builds a bridge between 
this emerging research area and the better established fields of convex 
analysis and optimization. 

This paper is organized as follows. The next section studies the min-sum algorithm in the context of pairwise separable convex programs, establishing convergence for a broad class of such problems. Section 3 extends this result to more general separable convex programs, where each factor can be a function of more than two variables. In Section 4, we discuss how our convergence results hold even with a totally asynchronous model of computation. When applied to a continuous optimization problem, messages computed and stored by the min-sum algorithm are functions over continuous domains. Except in very special cases, this is not feasible for digital computers, and in Section 5 we discuss implementable approaches to approximating the behavior of the min-sum algorithm. We close by discussing possible extensions and open issues in Section 6.
2 Pairwise Separable Convex Programs



Consider first the case of pairwise separable programs. These are programs of the form (1.1), where $|C| \le 2$ for all $C \in \mathcal{C}$. In this case, we can define an undirected graph $(V, E)$ based on the objective function. This graph has a vertex set $V$ corresponding to the decision variables, and an edge set $E$ defined by the pairwise factors,

$$
E = \{ C \in \mathcal{C} : |C| = 2 \}.
$$

Definition 1. (Pairwise Separable Convex Program) A pairwise separable convex program is an optimization problem of the form

$$
\begin{aligned}
\text{minimize} \quad & F(x) = \sum_{i \in V} f_i(x_i) + \sum_{(i,j) \in E} f_{ij}(x_i, x_j) \\
\text{subject to} \quad & x \in \mathbb{R}^V,
\end{aligned}
\tag{2.1}
$$

where the factors $\{f_i(\cdot)\}$ are strictly convex, coercive, and twice continuously differentiable, the factors $\{f_{ij}(\cdot,\cdot)\}$ are convex and twice continuously differentiable, and

$$
M \triangleq \min_{i \in V} \inf_{x \in \mathbb{R}^V} \frac{\partial^2}{\partial x_i^2} F(x) > 0.
$$

Under this definition, the objective function $F(x)$ is strictly convex and coercive. Hence, we can define $x^* \in \mathbb{R}^V$ to be the unique optimal solution.



2.1 The Min-Sum Algorithm 

The min-sum algorithm attempts to minimize the objective function $F(\cdot)$ by an iterative, message-passing procedure. For each vertex $i \in V$, denote the set of neighbors of $i$ in the graph by

$$
N(i) = \{ j \in V : (i, j) \in E \}.
$$

Denote the set of edges with direction distinguished by

$$
\vec{E} = \{ (i, j) \in V \times V : i \in N(j) \}.
$$

At time $t$, each vertex $i$ keeps track of a "message" from each neighbor $u \in N(i)$. This message takes the form of a function $J^{(t)}_{u \to i} : \mathbb{R} \to \mathbb{R}$. These incoming messages are combined to compute new outgoing messages for each neighbor. The message from vertex $i$ to vertex $j \in N(i)$ evolves according to

$$
J^{(t+1)}_{i \to j}(x_j) = \min_{y_i} \Big[ f_i(y_i) + f_{ij}(y_i, x_j) + \sum_{u \in N(i) \setminus j} J^{(t)}_{u \to i}(y_i) \Big] + \kappa^{(t+1)}_{i \to j}.
\tag{2.2}
$$
Here, $\kappa^{(t+1)}_{i \to j}$ represents an arbitrary offset term that varies from message to message. Only the relative values of the function $J^{(t+1)}_{i \to j}(\cdot)$ matter, so the choice of $\kappa^{(t+1)}_{i \to j}$ does not influence relevant information.

At each time $t \ge 0$, a local objective function $b^{(t)}_i(\cdot)$ is defined for each variable $x_i$ by

$$
b^{(t)}_i(x_i) = f_i(x_i) + \sum_{u \in N(i)} J^{(t)}_{u \to i}(x_i).
\tag{2.3}
$$

An estimate $x^{(t)}_i$ can be obtained for the optimal value of the variable $x_i$ by minimizing the local objective function:

$$
x^{(t)}_i = \operatorname*{argmin}_{y_i} b^{(t)}_i(y_i).
\tag{2.4}
$$

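The min-sum algorithm requires an initial set of messages $\{J^{(0)}_{i \to j}(\cdot)\}$ at time $t = 0$. We make the following assumption regarding these messages:

Assumption 1. (Min-Sum Initialization) Assume that the initial messages $\{J^{(0)}_{i \to j}(\cdot)\}$ are chosen to be twice continuously differentiable and so that, for each message $J^{(0)}_{i \to j}(\cdot)$, there exists some $z_{i \to j} \in \mathbb{R}$ with

$$
\frac{\partial^2}{\partial x_j^2} J^{(0)}_{i \to j}(x_j) \ge \frac{\partial^2}{\partial x_j^2} f_{ij}(z_{i \to j}, x_j), \qquad \forall\, x_j \in \mathbb{R}.
\tag{2.5}
$$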
Assumption 1 guarantees that the messages at time $t = 0$ are convex functions. Examining the update equation (2.2), it is clear that, by induction, this implies that all future messages are also convex functions, since partially minimizing a jointly convex function yields a convex function. Similarly, since the functions $\{f_i(\cdot)\}$ are strictly convex and coercive, and the functions $\{f_{ij}(\cdot,\cdot)\}$ are convex, it follows that the optimization problem in the update equation (2.2) is well-defined and has a unique minimizer. Finally, each local objective function $b^{(t)}_i(\cdot)$ must be strictly convex and coercive, and hence each estimate $x^{(t)}_i$ is uniquely defined by (2.4).

Assumption 1 also requires that the initial messages be sufficiently convex, in the sense of (2.5). As we will shortly demonstrate, this will be an important condition for our convergence results. For the moment, however, note that it is easy to select a set of initial messages satisfying Assumption 1. For example, one might choose

$$
J^{(0)}_{i \to j}(x_j) = f_{ij}(0, x_j).
$$
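The updates (2.2) through (2.4) with this initialization can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not an implementation from this paper: each message is tabulated on a uniform grid, the minimization over $y_i$ is done by brute force over the same grid, and the offset $\kappa^{(t+1)}_{i \to j}$ is chosen so that every message has minimum value zero. The helper name `min_sum_grid` and the three-vertex example are hypothetical.

```python
import numpy as np

def min_sum_grid(f_single, f_pair, edges, grid, iters=40):
    """Synchronous min-sum updates (2.2)-(2.4) with every message tabulated on `grid`.

    f_single: dict i -> vectorized f_i(y)
    f_pair:   dict (i, j) -> vectorized f_ij(y_i, y_j), one entry per undirected edge
    """
    nbrs = {i: [] for i in f_single}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def pair(i, j, yi, yj):
        # f_ij(y_i, y_j), whichever orientation of the edge was supplied
        return f_pair[(i, j)](yi, yj) if (i, j) in f_pair else f_pair[(j, i)](yj, yi)

    Y, X = np.meshgrid(grid, grid, indexing="ij")  # Y[a, b] = grid[a] (y_i), X[a, b] = grid[b] (x_j)
    # Initialization J^(0)_{i->j}(x_j) = f_ij(0, x_j), which satisfies Assumption 1
    J = {(i, j): pair(i, j, 0.0, grid) for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new_J = {}
        for i, j in J:
            # f_i(y_i) plus all messages into i except the one coming from j
            h = f_single[i](grid) + sum(J[(u, i)] for u in nbrs[i] if u != j)
            msg = np.min(h[:, None] + pair(i, j, Y, X), axis=0)  # brute-force min over y_i
            new_J[(i, j)] = msg - msg.min()                      # offset kappa: min value 0
        J = new_J
    # Local objectives (2.3) and their grid minimizers (2.4)
    return {i: float(grid[np.argmin(f_single[i](grid) + sum(J[(u, i)] for u in nbrs[i]))])
            for i in nbrs}

# Hypothetical example: a 3-cycle with f_i(y) = (y - c_i)^2 and f_ij = (y_i - y_j)^2 / 2
c = {0: 1.0, 1: -1.0, 2: 0.5}
f_single = {i: (lambda y, ci=ci: (y - ci) ** 2) for i, ci in c.items()}
edges = [(0, 1), (1, 2), (2, 0)]
f_pair = {e: (lambda a, b: 0.5 * (a - b) ** 2) for e in edges}
x_hat = min_sum_grid(f_single, f_pair, edges, np.linspace(-3.0, 3.0, 301))
```

For this instance $\sum_i c_i = 0.5$, so the optimality conditions reduce to $5 x_i = 2 c_i + 0.5$ and $x^* = (0.5, -0.3, 0.3)$. The Hessian of $F$ has diagonal entries 4 and off-diagonal entries $-1$, so the grid estimates should land within a grid step or two of $x^*$, up to discretization error.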
2.2 Convergence

Our goal is to understand conditions under which the min-sum algorithm converges to the optimal solution $x^*$, i.e.,

$$
\lim_{t \to \infty} x^{(t)} = x^*.
$$

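Consider the following diagonal dominance condition:

Definition 2. (Scaled Diagonal Dominance) An objective function $F : \mathbb{R}^V \to \mathbb{R}$ is $(\lambda, w)$-scaled diagonally dominant if $\lambda$ is a scalar with $0 < \lambda < 1$ and $w \in \mathbb{R}^V$ is a vector with $w > 0$, so that for each $i \in V$ and all $x \in \mathbb{R}^V$,

$$
\sum_{j \in V \setminus i} w_j \left| \frac{\partial^2}{\partial x_i \partial x_j} F(x) \right| \le \lambda\, w_i \frac{\partial^2}{\partial x_i^2} F(x).
$$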
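For a quadratic objective $F(x) = \tfrac12 x^\top H x - b^\top x$ the Hessian is constant, so Definition 2 can be checked directly. The Python sketch below (function names hypothetical) computes the smallest admissible $\lambda$ for given weights $w$, and builds candidate weights from the Perron vector of the nonnegative matrix $B_{ij} = |H_{ij}| / H_{ii}$, $j \ne i$, which is one Perron-Frobenius-based way to choose $w$. For a non-quadratic $F$ the inequality must hold at every $x$, so checking finitely many points would only be a heuristic.

```python
import numpy as np

def sdd_lambda(H, w):
    """Smallest lambda with sum_{j != i} w_j |H_ij| <= lambda * w_i * H_ii for every i.

    H is a constant Hessian (e.g., of a quadratic objective) with positive diagonal.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    off = np.abs(H) - np.diag(np.abs(np.diag(H)))  # |H_ij| with the diagonal zeroed
    return float(np.max(off @ w / (w * np.diag(H))))

def perron_weights(H):
    """Candidate weights: Perron eigenvector of B_ij = |H_ij| / H_ii (j != i), scaled to max 1.

    For an irreducible B this vector is positive, and sdd_lambda then equals the
    spectral radius of B.
    """
    H = np.asarray(H, dtype=float)
    B = (np.abs(H) - np.diag(np.abs(np.diag(H)))) / np.diag(H)[:, None]
    vals, vecs = np.linalg.eig(B)
    w = np.abs(vecs[:, np.argmax(vals.real)].real)
    return w / w.max()

# Hypothetical 2 x 2 Hessian: unit weights give lambda = 0.9,
# Perron weights give lambda = 0.45 (up to rounding)
H = [[1.0, 0.9], [0.9, 4.0]]
print(sdd_lambda(H, [1.0, 1.0]), sdd_lambda(H, perron_weights(H)))
```

The example shows why the weights matter: with $w = \mathbf{1}$ the first row is barely dominant, while reweighting the vertices halves the admissible $\lambda$.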



Our main convergence result is as follows: 

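Theorem 1. Consider a pairwise separable convex program with an objective function that is $(\lambda, w)$-scaled diagonally dominant. Assume that the min-sum algorithm is initialized in accordance with Assumption 1. Define the constant

$$
K = \frac{1}{M} \frac{\max_v w_v}{\min_u w_u}.
$$

Then, the iterates of the min-sum algorithm satisfy, for all $i \in V$ and $t \ge 0$,

$$
\big| x^{(t)}_i - x^*_i \big| \le \frac{K \lambda^t}{1 - \lambda} \sum_{(u,v) \in \vec{E}} \left| \frac{\partial}{\partial x_v} J^{(0)}_{u \to v}(x^*_v) - \frac{\partial}{\partial x_v} f_{uv}(x^*_u, x^*_v) \right|.
$$

Hence,

$$
\lim_{t \to \infty} x^{(t)} = x^*.
$$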

Proof. The proof of Theorem 1 will be provided in Section 2.4. □
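Because the bound in Theorem 1 decays geometrically in $t$, it yields an explicit iteration count. The Python sketch below assumes that $K$, $\lambda$, and an upper bound $S$ on the sum over $\vec{E}$ are known. In practice that sum depends on $x^*$, so any value used for $S$ is an assumption. The function names are hypothetical.

```python
import math

def theorem1_bound(K, lam, S, t):
    """Right-hand side K * lam**t / (1 - lam) * S of the error bound in Theorem 1."""
    return K * lam ** t * S / (1.0 - lam)

def iterations_for_tolerance(K, lam, S, eps):
    """Smallest integer t >= 0 with theorem1_bound(K, lam, S, t) <= eps (requires 0 < lam < 1)."""
    if theorem1_bound(K, lam, S, 0) <= eps:
        return 0
    t = math.ceil(math.log(eps * (1.0 - lam) / (K * S)) / math.log(lam))
    # guard against floating-point rounding at the boundary
    while theorem1_bound(K, lam, S, t) > eps:
        t += 1
    while t > 0 and theorem1_bound(K, lam, S, t - 1) <= eps:
        t -= 1
    return t

# Example: K = 1, lambda = 0.5, S = 2, tolerance 1e-6
print(iterations_for_tolerance(1.0, 0.5, 2.0, 1e-6))  # 22
```

The count grows only logarithmically in $1/\epsilon$, but like $1/\log(1/\lambda)$ as $\lambda \to 1$, so weakly dominant problems may need many more iterations.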

We can compare Theorem 1 to existing results on min-sum convergence in the case where the objective function $F(\cdot)$ is quadratic. Rusmevichientong and Van Roy [7] developed abstract conditions for convergence, but these conditions are difficult to verify in practical instances. Convergence has also been established in special cases arising in certain applications [11, 12].

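More closely related to our current work, Weiss and Freeman [B] established convergence when the factors $\{f_i(\cdot), f_{ij}(\cdot,\cdot)\}$ are quadratic, the single-variable factors $\{f_i(\cdot)\}$ are strictly convex, and the pairwise factors $\{f_{ij}(\cdot,\cdot)\}$ are convex and diagonally dominated, i.e.,

$$
\left| \frac{\partial^2}{\partial x_i \partial x_j} f_{ij}(x_i, x_j) \right| \le \frac{\partial^2}{\partial x_i^2} f_{ij}(x_i, x_j).
$$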
The results of Malioutov et al. [10] and our prior work [S] remove the diagonal dominance assumption. However, all of these results are special cases of Theorem 1. In particular, if a quadratic objective function $F(\cdot)$ decomposes into pairwise factors so that the single-variable factors are quadratic and strictly convex, and the pairwise factors are quadratic and convex, then $F(\cdot)$ must be scaled diagonally dominant. This can be established as a consequence of the Perron-Frobenius theorem [10]. Finally, as we will see in Section 3, Theorem 1 also generalizes beyond pairwise decompositions.

2.3 The Computation Tree 

In order to prove Theorem 1, we first introduce the notion of the computation tree. This is a useful device in the analysis of message-passing algorithms, originally introduced by Wiberg [13]. Given a vertex $r \in V$ and a time $t$, the computation tree defines an optimization problem that is constructed by "unrolling" all the optimizations involved in the computation of the min-sum estimate $x^{(t)}_r$.

Formally, the computation tree is a graph $\mathcal{T} = (\mathcal{V}, \mathcal{E})$ where each vertex $i \in \mathcal{V}$ is labeled by a vertex $\sigma_i \in V$ in the original graph, through a mapping $\sigma : \mathcal{V} \to V$. This mapping is required to preserve the edge structure of the graph, so that if $(i, j) \in \mathcal{E}$, then $(\sigma_i, \sigma_j) \in E$. Given a vertex $i \in \mathcal{V}$, we will abuse notation and refer to the corresponding vertex $\sigma_i \in V$ in the original graph simply by $i$.

Fixing a vertex $r \in V$ and a time $t$, the computation tree rooted at $r$ and of depth $t$ is defined iteratively. Initially, the tree consists of a single root vertex corresponding to $r$. At each subsequent step, the leaves in the computation tree are examined. Given a leaf $i$ with a parent $j$, a vertex $u$ and an edge $(u, i)$ are added to the computation tree for each neighbor $u$ of $i$ in the original graph, excluding $j$ (the root has no parent, so it receives a child for every neighbor). This process is repeated for $t$ steps. An example of the resulting graph is illustrated in Figure 1.
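The unrolling procedure just described can be sketched in Python as follows. The adjacency-dict representation and the function name `computation_tree` are assumptions made for illustration. Each tree node records its label $\sigma_i$, its parent, and its depth.

```python
from collections import deque

def computation_tree(adj, root, depth):
    """Unroll the graph `adj` (dict: vertex -> list of neighbors) from `root` for `depth` steps.

    Returns parallel lists: labels[k] = sigma_k, the original vertex of tree node k;
    parent[k] = index of the parent node (None for the root); level[k] = distance to the root.
    """
    labels, parent, level = [root], [None], [0]
    frontier = deque([0])
    while frontier:
        k = frontier.popleft()
        if level[k] == depth:
            continue  # nodes at depth t are the leaves
        parent_label = labels[parent[k]] if parent[k] is not None else None
        for u in adj[labels[k]]:
            if u == parent_label:
                continue  # skip the neighbor we came from
            labels.append(u)
            parent.append(k)
            level.append(level[k] + 1)
            frontier.append(len(labels) - 1)
    return labels, parent, level

# Example: a triangle, rooted at vertex 0, depth 3
labels, parent, level = computation_tree({0: [1, 2], 1: [0, 2], 2: [0, 1]}, 0, 3)
print(labels)  # [0, 1, 2, 2, 1, 0, 0]
```

Labels repeat (vertex 0 appears three times), which is exactly how a cycle in the original graph is unrolled into a tree.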

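Given the graph $\mathcal{T} = (\mathcal{V}, \mathcal{E})$ and the correspondence mapping $\sigma$, define a decision variable $x_i$ for each vertex $i \in \mathcal{V}$. Define a pairwise separable objective function $F_{\mathcal{T}} : \mathbb{R}^{\mathcal{V}} \to \mathbb{R}$ by considering factors of the form:

1. For each $i \in \mathcal{V}$, add a single-variable factor $f_i(x_i)$ by setting $f_i(\cdot) = f_{\sigma_i}(\cdot)$.
2. For each $(i, j) \in \mathcal{E}$, add a pairwise factor $f_{ij}(x_i, x_j)$ by setting $f_{ij}(\cdot,\cdot) = f_{\sigma_i \sigma_j}(\cdot,\cdot)$.
3. For each $i \in \mathcal{V}$ that is a leaf vertex with parent $j$, add a single-variable factor $J^{(0)}_{u \to \sigma_i}(x_i)$ for each neighbor $u \in N(\sigma_i) \setminus \sigma_j$ of $\sigma_i$ in the original graph.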
Figure 1: A graph and the corresponding computation tree, rooted at vertex 1 and of depth $t = 3$. The vertices in the computation tree are labeled according to the corresponding vertex in the original graph.

Now, let $\hat{x}$ be the optimal solution to the minimization of the computation tree objective $F_{\mathcal{T}}(\cdot)$. By inductively examining the operation of the min-sum algorithm, it is easy to establish that the component $\hat{x}_r$ of this solution at the root of the tree is precisely the min-sum estimate $x^{(t)}_r$.
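In the quadratic case this correspondence can be checked numerically. The Python sketch below is an illustration under assumptions of our own, not part of the original analysis. It uses hypothetical factors $f_i(x) = \tfrac12 a_i x^2 - b_i x$ and $f_{ij}(x_i, x_j) = \tfrac{g}{2}(x_i - x_j)^2$ on a 4-cycle with the initialization $J^{(0)}_{i \to j}(x_j) = f_{ij}(0, x_j)$, and represents each message by the coefficients of $J(x) = \tfrac12 \alpha x^2 + \beta x$ (the offset is dropped). It then compares the min-sum estimate $x^{(t)}_r$ with the root component of the minimizer of $F_{\mathcal{T}}$.

```python
import numpy as np

# Hypothetical instance: f_i(x) = a_i x^2 / 2 - b_i x, f_ij(x_i, x_j) = g (x_i - x_j)^2 / 2
a = np.array([2.0, 2.0, 2.0, 2.0])
b = np.array([1.0, -1.0, 0.5, 2.0])
g = 0.5
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # a 4-cycle

def min_sum_estimates(t):
    """t synchronous min-sum steps with quadratic messages J(x) = alpha x^2 / 2 + beta x."""
    alpha = {(i, j): g for i in adj for j in adj[i]}  # J^(0)_{i->j}(x) = f_ij(0, x) = g x^2 / 2
    beta = {(i, j): 0.0 for i in adj for j in adj[i]}
    for _ in range(t):
        new_alpha, new_beta = {}, {}
        for i, j in alpha:
            P = a[i] + g + sum(alpha[(u, i)] for u in adj[i] if u != j)
            c = sum(beta[(u, i)] for u in adj[i] if u != j) - b[i]
            # closed-form minimization over y_i in (2.2)
            new_alpha[(i, j)] = g - g * g / P
            new_beta[(i, j)] = g * c / P
        alpha, beta = new_alpha, new_beta
    # minimizers of the local objectives (2.3)-(2.4)
    return np.array([(b[i] - sum(beta[(u, i)] for u in adj[i]))
                     / (a[i] + sum(alpha[(u, i)] for u in adj[i])) for i in adj])

def tree_root_value(r, t):
    """Root component of the minimizer of the depth-t computation-tree objective F_T."""
    labels, parent, level = [r], [None], [0]
    k = 0
    while k < len(labels):  # breadth-first unrolling
        if level[k] < t:
            p_label = labels[parent[k]] if parent[k] is not None else None
            for u in adj[labels[k]]:
                if u != p_label:
                    labels.append(u)
                    parent.append(k)
                    level.append(level[k] + 1)
        k += 1
    n = len(labels)
    H, rhs = np.zeros((n, n)), np.zeros(n)
    for k in range(n):
        H[k, k] += a[labels[k]]  # factor f_{sigma_k}
        rhs[k] += b[labels[k]]
        p = parent[k]
        if p is not None:  # pairwise factor on the tree edge (k, p)
            H[k, k] += g
            H[p, p] += g
            H[k, p] -= g
            H[p, k] -= g
        if level[k] == t:  # leaf: initial messages g x^2 / 2 from the non-parent neighbors
            p_label = labels[p] if p is not None else None
            H[k, k] += g * sum(1 for u in adj[labels[k]] if u != p_label)
    return np.linalg.solve(H, rhs)[0]
```

For example, `min_sum_estimates(1)[0]` and `tree_root_value(0, 1)` both equal $7/17$ on this instance.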

The following lemma establishes that the computation tree inherits 
the scaled diagonal dominance property from the original objective 
function. 

Lemma 1. Consider a pairwise separable convex program with an objective function that is $(\lambda, w)$-scaled diagonally dominant. Assume that the min-sum algorithm is initialized in accordance with Assumption 1, and let $\mathcal{T} = (\mathcal{V}, \mathcal{E})$ be a computation tree associated with this program. Then, the computation tree objective function $F_{\mathcal{T}}(\cdot)$ is also $(\lambda, w)$-scaled diagonally dominant (where each tree vertex $i$ is assigned the weight $w_{\sigma_i}$).

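Proof. Given a vertex $i \in \mathcal{V}$, let $N_{\mathcal{T}}(i)$ be the neighborhood in the computation tree, and let $N(i)$ be the neighborhood of the corresponding vertex in the original graph. If $i \in \mathcal{V}$ is an interior vertex of the computation tree, then

$$
\sum_{u \in \mathcal{V} \setminus i} w_u \left| \frac{\partial^2}{\partial x_i \partial x_u} F_{\mathcal{T}}(x) \right|
= \sum_{u \in N_{\mathcal{T}}(i)} w_u \left| \frac{\partial^2}{\partial x_i \partial x_u} f_{iu}(x_i, x_u) \right|
\le \lambda\, w_i \frac{\partial^2}{\partial x_i^2} F_{\mathcal{T}}(x),
$$

where the inequality follows from the scaled diagonal dominance of the original objective function $F(\cdot)$.

Similarly, if $i$ is a leaf vertex with parent $j$,

$$
\begin{aligned}
\sum_{u \in \mathcal{V} \setminus i} w_u \left| \frac{\partial^2}{\partial x_i \partial x_u} F_{\mathcal{T}}(x) \right|
&= w_j \left| \frac{\partial^2}{\partial x_i \partial x_j} f_{ij}(x_i, x_j) \right| \\
&\le w_j \left| \frac{\partial^2}{\partial x_i \partial x_j} f_{ij}(x_i, x_j) \right| + \sum_{u \in N(i) \setminus j} w_u \left| \frac{\partial^2}{\partial x_u \partial x_i} f_{ui}(z_{u \to i}, x_i) \right| \\
&\le \lambda\, w_i \Big( \frac{\partial^2}{\partial x_i^2} f_i(x_i) + \frac{\partial^2}{\partial x_i^2} f_{ij}(x_i, x_j) + \sum_{u \in N(i) \setminus j} \frac{\partial^2}{\partial x_i^2} f_{ui}(z_{u \to i}, x_i) \Big) \\
&\le \lambda\, w_i \Big( \frac{\partial^2}{\partial x_i^2} f_i(x_i) + \frac{\partial^2}{\partial x_i^2} f_{ij}(x_i, x_j) + \sum_{u \in N(i) \setminus j} \frac{\partial^2}{\partial x_i^2} J^{(0)}_{u \to i}(x_i) \Big) \\
&= \lambda\, w_i \frac{\partial^2}{\partial x_i^2} F_{\mathcal{T}}(x).
\end{aligned}
$$

Here, the second inequality follows from the scaled diagonal dominance of the original objective function $F(\cdot)$, and the third inequality follows from Assumption 1. □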



2.4 Proof of Theorem 1

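In order to prove Theorem 1, we will study the evolution of the min-sum algorithm under a set of linear perturbations. Consider an arbitrary vector $p \in \mathbb{R}^{\vec{E}}$ with one component $p_{i \to j}$ for each $i \in V$ and $j \in N(i)$. Given an arbitrary vector $p$, define $\{J^{(t)}_{i \to j}(\cdot, p)\}$ to be the set of messages that evolve according to

$$
\begin{aligned}
J^{(0)}_{i \to j}(x_j, p) &= J^{(0)}_{i \to j}(x_j) + p_{i \to j}\, x_j, \\
J^{(t+1)}_{i \to j}(x_j, p) &= \min_{y_i} \Big[ f_i(y_i) + f_{ij}(y_i, x_j) + \sum_{u \in N(i) \setminus j} J^{(t)}_{u \to i}(y_i, p) \Big] + \kappa^{(t+1)}_{i \to j}.
\end{aligned}
\tag{2.6}
$$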

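Similarly, define $\{b^{(t)}_i(\cdot, p)\}$ and $\{x^{(t)}_i(p)\}$ to be the resulting local objective functions and optimal value estimates under this perturbation:

$$
b^{(t)}_i(x_i, p) = f_i(x_i) + \sum_{u \in N(i)} J^{(t)}_{u \to i}(x_i, p), \qquad
x^{(t)}_i(p) = \operatorname*{argmin}_{y_i} b^{(t)}_i(y_i, p).
$$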

The following simple lemma gives a particular choice of $p$ for which the min-sum algorithm yields the optimal solution at every time.

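Lemma 2. Define the vector $p^* \in \mathbb{R}^{\vec{E}}$ by setting, for each $i \in V$ and $j \in N(i)$,

$$
p^*_{i \to j} = \frac{\partial}{\partial x_j} f_{ij}(x^*_i, x^*_j) - \frac{\partial}{\partial x_j} J^{(0)}_{i \to j}(x^*_j).
$$

Then, at every time $t \ge 0$,

$$
\frac{\partial}{\partial x_j} J^{(t)}_{i \to j}(x^*_j, p^*) = \frac{\partial}{\partial x_j} f_{ij}(x^*_i, x^*_j),
\tag{2.7}
$$

and $x^{(t)}(p^*) = x^*$.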

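Proof. Note that the first order optimality conditions for $F(x)$ at $x^*$ imply that, for each $j$,

$$
\frac{\partial}{\partial x_j} f_j(x^*_j) + \sum_{i \in N(j)} \frac{\partial}{\partial x_j} f_{ij}(x^*_i, x^*_j) = 0.
$$

If (2.7) holds at time $t$, this is exactly the first order optimality condition for the minimization of $b^{(t)}_j(\cdot, p^*)$, thus $x^{(t)}_j(p^*) = x^*_j$.

Clearly (2.7) holds at time $t = 0$. Assume it holds at time $t \ge 0$. Then, when $x_j = x^*_j$, the first order optimality condition for the minimization over $y_i$ in (2.6) reduces to the optimality condition for $F$ at vertex $i$, so the minimizing value of $y_i$ is $x^*_i$. Differentiating (2.6) with respect to $x_j$ at this minimizer (by the envelope theorem) shows that (2.7) holds at time $t + 1$. □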
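Lemma 2 can be checked numerically on a quadratic example. The Python sketch below uses hypothetical factors $f_i(x) = \tfrac12 a_i x^2 - b_i x$ and $f_{ij}(x_i, x_j) = \tfrac{g}{2}(x_i - x_j)^2$ on a 4-cycle, with the initialization $J^{(0)}_{i \to j}(x_j) = f_{ij}(0, x_j)$. In that setting $p^*_{i \to j} = g(x^*_j - x^*_i) - g x^*_j = -g x^*_i$, and the perturbed messages stay quadratic, $J(x) = \tfrac12 \alpha x^2 + \beta x$, with the perturbation entering only the initial linear coefficient.

```python
import numpy as np

# Hypothetical instance: f_i(x) = a_i x^2 / 2 - b_i x, f_ij(x_i, x_j) = g (x_i - x_j)^2 / 2
a = np.array([2.0, 2.0, 2.0, 2.0])
b = np.array([1.0, -1.0, 0.5, 2.0])
g = 0.5
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # a 4-cycle

def perturbed_estimates(p, t):
    """Estimates x^(t)(p) from J^(0)_{i->j}(x, p) = g x^2 / 2 + p_{i->j} x, updated by (2.6)."""
    alpha = {e: g for e in p}
    beta = dict(p)  # the linear perturbation sets the initial linear coefficients
    for _ in range(t):
        new_alpha, new_beta = {}, {}
        for i, j in alpha:
            P = a[i] + g + sum(alpha[(u, i)] for u in adj[i] if u != j)
            c = sum(beta[(u, i)] for u in adj[i] if u != j) - b[i]
            new_alpha[(i, j)] = g - g * g / P
            new_beta[(i, j)] = g * c / P
        alpha, beta = new_alpha, new_beta
    return np.array([(b[i] - sum(beta[(u, i)] for u in adj[i]))
                     / (a[i] + sum(alpha[(u, i)] for u in adj[i])) for i in adj])

# Optimal solution: (diag(a) + g L) x* = b, with L the graph Laplacian
L = np.zeros((4, 4))
for i in adj:
    for j in adj[i]:
        L[i, i] += 1.0
        L[i, j] -= 1.0
x_star = np.linalg.solve(np.diag(a) + g * L, b)

# p*_{i->j} = -g x*_i for this choice of factors and initial messages
p_star = {(i, j): -g * x_star[i] for i in adj for j in adj[i]}
```

With `p_star`, the estimates should coincide with `x_star` at every time, up to floating-point rounding.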

Next, we will bound the sensitivity of the estimate $x^{(t)}_i(p)$ to the choice of $p$. The main technique employed here is analysis of the computation tree described in Section 2.3. In particular, the perturbation $p$ impacts the computation tree only through the leaf vertices at depth $t$. The scaled diagonal dominance property of the computation tree, provided by Lemma 1, can then be used to guarantee that this impact is diminishing in $t$.



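Lemma 3. We have, for all $p \in \mathbb{R}^{\vec{E}}$, $r \in V$, $(u, v) \in \vec{E}$, and $t \ge 0$,

$$
\left| \frac{\partial}{\partial p_{u \to v}} x^{(t)}_r(p) \right| \le \frac{K \lambda^t}{1 - \lambda}.
$$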



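Proof. Fix $r \in V$, and let $\mathcal{T} = (\mathcal{V}, \mathcal{E})$ be the computation tree rooted at $r$ after $t$ time steps. Let $F_{\mathcal{T}}(x, p)$ be the objective function of this computation tree, with the leaf factors $J^{(0)}_{u \to \sigma_i}(\cdot)$ replaced by their perturbed versions $J^{(0)}_{u \to \sigma_i}(\cdot, p)$, and let

$$
\hat{x}(p) = \operatorname*{argmin}_x F_{\mathcal{T}}(x, p),
$$

so that $x^{(t)}_r(p) = \hat{x}_r(p)$. By the first order optimality conditions, for any $j \in \mathcal{V}$,

$$
\frac{\partial}{\partial x_j} F_{\mathcal{T}}(\hat{x}(p), p) = 0.
$$

If $j$ is an interior vertex of $\mathcal{T}$, this becomes

$$
\frac{\partial}{\partial x_j} f_j(\hat{x}_j(p)) + \sum_{i \in N(j)} \frac{\partial}{\partial x_j} f_{ij}(\hat{x}_i(p), \hat{x}_j(p)) = 0.
\tag{2.8}
$$

If $j$ is a leaf with parent $u$, we have

$$
\frac{\partial}{\partial x_j} f_j(\hat{x}_j(p)) + \frac{\partial}{\partial x_j} f_{uj}(\hat{x}_u(p), \hat{x}_j(p)) + \sum_{i \in N(j) \setminus u} \Big[ \frac{\partial}{\partial x_j} J^{(0)}_{i \to j}(\hat{x}_j(p)) + p_{i \to j} \Big] = 0.
\tag{2.9}
$$

Now, fix some directed edge $(a, b)$, and differentiate (2.8) and (2.9) with respect to $p_{a \to b}$. We have, for an interior vertex $j$,

$$
\Big( \frac{\partial^2}{\partial x_j^2} f_j(\hat{x}_j(p)) + \sum_{i \in N(j)} \frac{\partial^2}{\partial x_j^2} f_{ij}(\hat{x}_i(p), \hat{x}_j(p)) \Big) \frac{\partial \hat{x}_j}{\partial p_{a \to b}}(p)
+ \sum_{i \in N(j)} \frac{\partial^2}{\partial x_i \partial x_j} f_{ij}(\hat{x}_i(p), \hat{x}_j(p)) \frac{\partial \hat{x}_i}{\partial p_{a \to b}}(p) = 0,
$$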
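and for a leaf vertex $j$ with parent $u$,

$$
\Big( \frac{\partial^2}{\partial x_j^2} f_j(\hat{x}_j(p)) + \frac{\partial^2}{\partial x_j^2} f_{uj}(\hat{x}_u(p), \hat{x}_j(p)) + \sum_{i \in N(j) \setminus u} \frac{\partial^2}{\partial x_j^2} J^{(0)}_{i \to j}(\hat{x}_j(p)) \Big) \frac{\partial \hat{x}_j}{\partial p_{a \to b}}(p)
+ \frac{\partial^2}{\partial x_u \partial x_j} f_{uj}(\hat{x}_u(p), \hat{x}_j(p)) \frac{\partial \hat{x}_u}{\partial p_{a \to b}}(p)
+ \mathbb{1}\{ \sigma_j = b,\ \sigma_u \ne a \} = 0.
$$

We can write this system of equations in matrix form, as

$$
\Gamma v^{a \to b} + h^{a \to b} = 0.
\tag{2.10}
$$

Here, $v^{a \to b} \in \mathbb{R}^{\mathcal{V}}$ is a vector with components

$$
v^{a \to b}_j = \frac{\partial \hat{x}_j}{\partial p_{a \to b}}(p).
$$

The vector $h^{a \to b} \in \mathbb{R}^{\mathcal{V}}$ has components

$$
h^{a \to b}_j = \mathbb{1}\{ j \text{ is a leaf vertex with } \sigma_j = b \text{ whose parent } u \text{ has } \sigma_u \ne a \}.
$$

The symmetric matrix $\Gamma \in \mathbb{R}^{\mathcal{V} \times \mathcal{V}}$ has components as follows:

1. If $j$ is an interior vertex, $\Gamma_{jj} = \frac{\partial^2}{\partial x_j^2} f_j(\hat{x}_j(p)) + \sum_{i \in N(j)} \frac{\partial^2}{\partial x_j^2} f_{ij}(\hat{x}_i(p), \hat{x}_j(p))$.
2. If $j$ is an interior vertex and $i \in N(j)$, $\Gamma_{ij} = \Gamma_{ji} = \frac{\partial^2}{\partial x_i \partial x_j} f_{ij}(\hat{x}_i(p), \hat{x}_j(p))$.
3. If $j$ is a leaf vertex with parent $u$, $\Gamma_{jj} = \frac{\partial^2}{\partial x_j^2} f_j(\hat{x}_j(p)) + \frac{\partial^2}{\partial x_j^2} f_{uj}(\hat{x}_u(p), \hat{x}_j(p)) + \sum_{i \in N(j) \setminus u} \frac{\partial^2}{\partial x_j^2} J^{(0)}_{i \to j}(\hat{x}_j(p))$, and $\Gamma_{ju} = \Gamma_{uj} = \frac{\partial^2}{\partial x_u \partial x_j} f_{uj}(\hat{x}_u(p), \hat{x}_j(p))$.
4. All other entries of $\Gamma$ are zero.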

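Note that $\Gamma = \nabla^2_x F_{\mathcal{T}}(\hat{x}(p), p)$. Then, Lemma 1 implies that

$$
\sum_{i \in \mathcal{V} \setminus j} w_i |\Gamma_{ij}| \le \lambda\, w_j \Gamma_{jj}.
\tag{2.11}
$$

Define, for vectors $x \in \mathbb{R}^{\mathcal{V}}$, the weighted sup-norm

$$
\|x\|^w_\infty = \max_j |x_j| / w_j.
$$

For a linear operator $A : \mathbb{R}^{\mathcal{V}} \to \mathbb{R}^{\mathcal{V}}$, the corresponding induced operator norm is given by

$$
\|A\|^w_\infty = \max_j \frac{1}{w_j} \sum_i |A_{ji}|\, w_i.
$$

Define the matrices

$$
D = \operatorname{diag}(\Gamma_{jj}), \qquad R = I - D^{-1} \Gamma.
$$

Then, (2.11) implies that

$$
\|R\|^w_\infty \le \lambda < 1.
$$

Hence, the matrix $I - R = D^{-1} \Gamma$ is invertible, and

$$
(D^{-1} \Gamma)^{-1} = \sum_{s=0}^{\infty} R^s.
$$

Examining the linear equation (2.10), we have

$$
v^{a \to b} = -\Gamma^{-1} h^{a \to b} = -(D^{-1} \Gamma)^{-1} D^{-1} h^{a \to b} = -\sum_{s=0}^{\infty} R^s D^{-1} h^{a \to b}.
$$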

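We are interested in bounding the value of the component $v^{a \to b}_r$ (recall that $v^{a \to b}_r = \partial x^{(t)}_r(p) / \partial p_{a \to b}$). Hence, we have

$$
v^{a \to b}_r = -\sum_{s=0}^{\infty} \big[ R^s D^{-1} h^{a \to b} \big]_r.
$$

Since $h^{a \to b}$ is zero on interior vertices, any leaf vertex is distance $t$ from the root $r$, and $R$ has a zero diagonal and the sparsity pattern of the tree, we have

$$
\big[ R^s D^{-1} h^{a \to b} \big]_r = 0, \qquad \text{for } s < t.
$$

Thus,

$$
v^{a \to b}_r = -\sum_{s=t}^{\infty} \big[ R^s D^{-1} h^{a \to b} \big]_r.
$$

Then,

$$
\begin{aligned}
\big| v^{a \to b}_r \big|
&\le w_r \sum_{s=t}^{\infty} \big\| R^s D^{-1} h^{a \to b} \big\|^w_\infty
\le w_r \sum_{s=t}^{\infty} \lambda^s \big\| D^{-1} h^{a \to b} \big\|^w_\infty
= \frac{\lambda^t}{1 - \lambda}\, w_r \max_{j \in \mathcal{V}} \frac{|h^{a \to b}_j|}{w_j \Gamma_{jj}} \\
&\le \frac{\lambda^t}{1 - \lambda} \max_{i \in V} \sup_{x \in \mathbb{R}^V} \frac{w_r}{w_i \frac{\partial^2}{\partial x_i^2} F(x)}
\le \frac{\lambda^t}{1 - \lambda}\, \frac{1}{M} \max_{i \in V} \frac{w_r}{w_i}
\le \frac{K \lambda^t}{1 - \lambda}.
\end{aligned}
$$

In the second line we used $|h^{a \to b}_j| \le 1$ and the fact that each $\Gamma_{jj}$ is bounded below by $\partial^2 F(y) / \partial x_{\sigma_j}^2$ for some $y \in \mathbb{R}^V$ (for leaf vertices, this uses Assumption 1). □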
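The mechanism behind this bound can be illustrated on a small tree. The Python sketch below uses a hypothetical matrix and is only an illustration: a path of five vertices with $\Gamma_{jj} = 3$ and $\Gamma_{j, j \pm 1} = -1$ (so (2.11) holds with $w = \mathbf{1}$ and $\lambda = 2/3$), with the nonzero entry of $h$ placed at the far end, at distance $t = 4$ from the root. It checks that the first $t$ terms of the Neumann series contribute nothing at the root, and that the root component obeys the geometric tail bound.

```python
import numpy as np

def weighted_sup_norm(x, w):
    """||x||_inf^w = max_j |x_j| / w_j."""
    return float(np.max(np.abs(x) / w))

def induced_weighted_norm(A, w):
    """Induced norm ||A||_inf^w = max_j (1 / w_j) sum_i |A_ji| w_i."""
    return float(np.max((np.abs(A) @ w) / w))

def neumann_solution(Gamma, h, terms):
    """Partial sum -sum_{s < terms} R^s D^{-1} h of v = -Gamma^{-1} h, with R = I - D^{-1} Gamma."""
    d = np.diag(Gamma)
    R = np.eye(len(d)) - Gamma / d[:, None]
    y = h / d  # D^{-1} h
    v = np.zeros_like(y)
    for _ in range(terms):
        v -= y
        y = R @ y
    return v, R

# Path 0 - 1 - 2 - 3 - 4 rooted at vertex 0; h is supported on the leaf at distance t = 4
n, t = 5, 4
Gamma = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
h = np.zeros(n)
h[-1] = 1.0
w = np.ones(n)

v_exact = -np.linalg.solve(Gamma, h)
v_partial, R = neumann_solution(Gamma, h, t)
lam = induced_weighted_norm(R, w)
tail_bound = w[0] * lam ** t / (1.0 - lam) * weighted_sup_norm(h / np.diag(Gamma), w)
```

Here `v_partial[0]` is exactly zero, while `abs(v_exact[0])` stays below `tail_bound`, mirroring the two steps of the proof.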



The following lemma combines the results from Lemmas 2 and 3. Theorem 1 follows by taking $p = 0$.

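Lemma 4. Given an arbitrary vector $p \in \mathbb{R}^{\vec{E}}$, for all $r \in V$ and $t \ge 0$,

$$
\big| x^{(t)}_r(p) - x^*_r \big| \le \frac{K \lambda^t}{1 - \lambda} \sum_{(u,v) \in \vec{E}} \big| p_{u \to v} - p^*_{u \to v} \big|.
$$

Proof. For any $r \in V$, define

$$
g^{(t)}_r(\theta) = x^{(t)}_r\big(\theta p + (1 - \theta) p^*\big).
$$

We have, from Lemma 2,

$$
x^{(t)}_r(p) - x^*_r = x^{(t)}_r(p) - x^{(t)}_r(p^*) = g^{(t)}_r(1) - g^{(t)}_r(0).
$$

By the mean value theorem and Lemma 3,

$$
\begin{aligned}
\big| x^{(t)}_r(p) - x^*_r \big|
&\le \sup_{\theta \in [0,1]} \left| \frac{d}{d\theta} g^{(t)}_r(\theta) \right| \\
&\le \sup_{\theta \in [0,1]} \sum_{(u,v) \in \vec{E}} \left| \frac{\partial}{\partial p_{u \to v}} x^{(t)}_r\big(\theta p + (1 - \theta) p^*\big) \right| \big| p_{u \to v} - p^*_{u \to v} \big| \\
&\le \frac{K \lambda^t}{1 - \lambda} \sum_{(u,v) \in \vec{E}} \big| p_{u \to v} - p^*_{u \to v} \big|.
\end{aligned}
$$

□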
3 General Separable Convex Programs

In this section we will consider convergence of the min-sum algorithm for more general separable convex programs. In particular, consider a vector of real-valued decision variables $x \in \mathbb{R}^V$, indexed by a finite set $V$, and a hypergraph $(V, \mathcal{C})$, where the set $\mathcal{C}$ is a collection of subsets (or "hyperedges") of the vertex set $V$.

Definition 3. (General Separable Convex Program) A general separable convex program is an optimization problem of the form

$$
\begin{aligned}
\text{minimize} \quad & F(x) = \sum_{i \in V} f_i(x_i) + \sum_{C \in \mathcal{C}} f_C(x_C) \\
\text{subject to} \quad & x \in \mathbb{R}^V,
\end{aligned}
\tag{3.1}
$$

where the factors $\{f_i(\cdot)\}$ are strictly convex, coercive, and twice continuously differentiable, the factors $\{f_C(\cdot)\}$ are convex and twice continuously differentiable, and

$$
M \triangleq \min_{i \in V} \inf_{x \in \mathbb{R}^V} \frac{\partial^2}{\partial x_i^2} F(x) > 0.
$$

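In this setting, the min-sum algorithm operates by passing messages between vertices and hyperedges. In particular, denote the set of neighbor hyperedges to a vertex $i \in V$ by

$$
N_f(i) = \{ C \in \mathcal{C} : i \in C \}.
$$

The min-sum update equations take the form

$$
\begin{aligned}
J^{(t+1)}_{i \to C}(x_i) &= f_i(x_i) + \sum_{C' \in N_f(i) \setminus C} J^{(t)}_{C' \to i}(x_i) + \kappa^{(t+1)}_{i \to C}, \\
J^{(t+1)}_{C \to i}(x_i) &= \min_{y_{C \setminus i}} \Big[ f_C(x_i, y_{C \setminus i}) + \sum_{j \in C \setminus i} J^{(t)}_{j \to C}(y_j) \Big] + \kappa^{(t+1)}_{C \to i}.
\end{aligned}
\tag{3.2}
$$

Local objective functions and estimates of the optimal solution are defined by

$$
b^{(t)}_i(x_i) = f_i(x_i) + \sum_{C \in N_f(i)} J^{(t)}_{C \to i}(x_i), \qquad
x^{(t)}_i = \operatorname*{argmin}_{y_i} b^{(t)}_i(y_i).
$$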

We will make the following assumption on the initial messages:

Assumption 2. (Min-Sum Initialization) Assume that the initial messages $\{J^{(0)}_{C \to i}(\cdot)\}$ are chosen to be twice continuously differentiable and so that, for each message $J^{(0)}_{C \to j}(\cdot)$, there exists some $z_{C \to j} \in \mathbb{R}^{C \setminus j}$ with

$$
\frac{\partial^2}{\partial x_j^2} J^{(0)}_{C \to j}(x_j) \ge \frac{\partial^2}{\partial x_j^2} f_C(x_j, z_{C \to j}), \qquad \forall\, x_j \in \mathbb{R}.
$$

Then, we have the following analog of Theorem 1.

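Theorem 2. Consider a general separable convex program. Assume that either:

(i) The objective function $F(x)$ is scaled diagonally dominant, and each pair of vertices $i, j$ participates in at most one common factor. That is,

$$
\big| \{ C \in \mathcal{C} : \{i, j\} \subset C \} \big| \le 1, \qquad \forall\, i, j \in V,\ i \ne j.
$$

(ii) The factors $\{f_C(\cdot)\}$ are individually scaled diagonally dominant, in the sense that there exists a scalar $\lambda \in (0, 1)$ and a vector $w \in \mathbb{R}^V$ with $w > 0$, so that for all $C \in \mathcal{C}$, $i \in C$, and $x_C \in \mathbb{R}^C$,

$$
\sum_{j \in C \setminus i} w_j \left| \frac{\partial^2}{\partial x_i \partial x_j} f_C(x_C) \right| \le \lambda\, w_i \frac{\partial^2}{\partial x_i^2} f_C(x_C).
$$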



Assume that the min-sum algorithm is initialized in accordance with Assumption 2. Define the constant

$$
K = \frac{1}{M} \frac{\max_v w_v}{\min_u w_u}.
$$

Then, the iterates of the min-sum algorithm satisfy 



Hence,

$$
\lim_{t \to \infty} x^{(t)} = x^*.
$$



Proof. This result can be proved using the same method as Theorem 1. The main modification required is the development of a suitable analog of Lemma 1. In the general case, scaled diagonal dominance of the computation tree does not follow from scaled diagonal dominance of the objective function $F(x)$. However, it is easy to verify that either of the hypotheses (i) or (ii) implies scaled diagonal dominance of the computation tree. The balance of the proof proceeds as in Section 2.4. □
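The two hypotheses of Theorem 2 can be checked mechanically for simple instances. The Python sketch below (function names hypothetical) tests the structural part of hypothesis (i), that no pair of vertices shares more than one factor, and computes the smallest $\lambda$ in hypothesis (ii) for quadratic factors $f_C(x_C) = \tfrac12 x_C^\top Q_C x_C$, whose Hessians are constant. For non-quadratic factors the inequality in (ii) must hold at every $x_C$, which this sketch does not check.

```python
from collections import Counter
from itertools import combinations

import numpy as np

def pairs_share_at_most_one_factor(factors):
    """Structural part of hypothesis (i): every pair {i, j} lies in at most one factor C."""
    counts = Counter(pair for C in factors for pair in combinations(sorted(set(C)), 2))
    return all(count <= 1 for count in counts.values())

def factor_lambda(factors, Qs, w):
    """Smallest lambda in hypothesis (ii) for quadratic factors x_C^T Q_C x_C / 2 and weights w."""
    lam = 0.0
    for C, Q in zip(factors, Qs):
        Q = np.asarray(Q, dtype=float)
        wC = np.asarray([w[i] for i in C], dtype=float)
        off = np.abs(Q) - np.diag(np.abs(np.diag(Q)))  # |Q_ij| for j != i
        lam = max(lam, float(np.max(off @ wC / (wC * np.diag(Q)))))
    return lam

# Hypothetical hypergraph: no pair of vertices appears in two factors
print(pairs_share_at_most_one_factor([(0, 1, 2), (2, 3), (3, 4, 0)]))  # True
```

A coupling factor such as $\tfrac12 (x_i - x_j)^2$ gives a ratio of exactly 1, so on its own it does not satisfy the strict inequality required in (ii).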
4 Asynchronous Convergence



The convergence results of Theorems 1 and 2 assumed a synchronous model of computation. That is, each message is updated at every time step in parallel. The min-sum update equations (2.2) and (3.2) are naturally decentralized, however. If we consider the application of the min-sum algorithm in distributed contexts, it is necessary to consider convergence under an asynchronous model of computation. In this section, we will establish that Theorems 1 and 2 extend to an asynchronous setting.

Without loss of generality, consider the pairwise case. Assume that there is a processor associated with each vertex $i$ in the graph, and that this processor is responsible for computing the message $J_{i \to j}(\cdot)$ for each neighbor $j$ of vertex $i$. Each processor occasionally communicates its messages to neighboring processors, and occasionally computes new messages based on the most recent messages it has received. Define $T^i$ to be the set of times at which new messages are computed at vertex $i$. Define $0 \le \tau_{j \to i}(t) \le t$ to be the last time the processor at vertex $j$ communicated to the processor at vertex $i$. Then, the messages evolve according to

$$
J^{(t+1)}_{i \to j}(x_j) = \min_{y_i} \Big[ f_i(y_i) + f_{ij}(y_i, x_j) + \sum_{u \in N(i) \setminus j} J^{(\tau_{u \to i}(t))}_{u \to i}(y_i) \Big] + \kappa^{(t+1)}_{i \to j},
$$

if $t \in T^i$, and

$$
J^{(t+1)}_{i \to j}(x_j) = J^{(t)}_{i \to j}(x_j),
$$

otherwise.

We will make the following assumption [14]:

Assumption 3. (Total Asynchronism) Assume that each set $T^i$ is infinite, and that if $\{t_k\}$ is a sequence in $T^i$ tending to infinity, then

$$
\lim_{k \to \infty} \tau_{j \to i}(t_k) = \infty,
$$

for each neighbor $j \in N(i)$.

Total asynchronism is a very mild assumption. It guarantees that 
each component is updated infinitely often, and that processors eventu- 
ally communicate with neighboring processors. It allows for arbitrary 
delays in communication, and even the out-of-order arrival of messages 
between processors. 

Theorem 1 can be extended to the totally asynchronous setting. To see this, note that we can repeat the construction of the computation tree in Section 2.3. As in the synchronous case, the initial messages only impact the leaves of the computation tree. The total asynchronism assumption guarantees that these leaves are, eventually, arbitrarily far away from the root of the computation tree. The arguments in Lemma 3 then imply that the optimal value at the root of the computation tree is insensitive to the choice of initial messages. Convergence follows, as in Section 2.4.



The scaled diagonal dominance requirement of our convergence result is similar to conditions required for the totally asynchronous convergence of other optimization algorithms. Consider, for example, a decentralized coordinate descent algorithm. Here, the processor associated with vertex $i$ maintains an estimate $x^{(t)}_i$ of the $i$th component of the optimal solution at time $t$. These estimates are updated according to

$$
x^{(t+1)}_i = \operatorname*{argmin}_{y_i} \Big[ f_i(y_i) + \sum_{u \in N(i)} f_{ui}\big(x^{(\tau_{u \to i}(t))}_u, y_i\big) \Big],
$$

if $t \in T^i$, and $x^{(t+1)}_i = x^{(t)}_i$ otherwise. Similarly, consider a decentralized gradient method, where

$$
x^{(t+1)}_i = x^{(t)}_i - \alpha \Big[ \frac{\partial}{\partial x_i} f_i(x^{(t)}_i) + \sum_{u \in N(i)} \frac{\partial}{\partial x_i} f_{ui}\big(x^{(\tau_{u \to i}(t))}_u, x^{(t)}_i\big) \Big],
$$

if $t \in T^i$, and $x^{(t+1)}_i = x^{(t)}_i$ otherwise, for some small positive step size $\alpha$. These methods are not guaranteed to converge for arbitrary pairwise separable convex optimization problems. Typically, some sort of diagonal dominance condition is needed [14].
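The decentralized coordinate descent iteration above can be simulated on a small example. The Python sketch below is a hypothetical test setup, not part of the paper. It uses a quadratic $F(x) = \tfrac12 x^\top A x - b^\top x$ with $A$ diagonally dominant, picks one random processor $i$ per step (so that $t \in T^i$), and reads every other coordinate with a random delay of at most `max_delay` steps. Bounded random delays are only a simple stand-in for the much weaker total asynchronism of Assumption 3.

```python
import numpy as np

def async_coordinate_descent(A, b, steps, max_delay, seed=0):
    """Asynchronous coordinate descent for F(x) = x^T A x / 2 - b^T x with stale neighbor values."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(b)
    history = [np.zeros(n)]  # history[t] = x^(t)
    for t in range(steps):
        x = history[-1].copy()
        i = int(rng.integers(n))  # processor i updates at time t
        # stale values x_u^(tau_{u->i}(t)) with t - max_delay <= tau <= t
        stale = np.array([history[max(0, t - int(rng.integers(max_delay + 1)))][u]
                          for u in range(n)])
        # exact minimization of F over y_i with the other coordinates at their stale values
        x[i] = (b[i] - A[i] @ stale + A[i, i] * stale[i]) / A[i, i]
        history.append(x)
    return history[-1]

# Hypothetical instance: diagonal 4, off-diagonal -1 (scaled diagonal dominance with lambda = 1/2)
A = [[4.0, -1.0, -1.0], [-1.0, 4.0, -1.0], [-1.0, -1.0, 4.0]]
b = [2.0, -2.0, 1.0]
x = async_coordinate_descent(A, b, steps=3000, max_delay=5)
```

For this instance the exact minimizer is $(0.5, -0.3, 0.3)$. Each update shrinks the sup-norm error of the values it reads by a factor of at least $1/2$, which is why stale reads do not prevent convergence here.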



5 Implementation 

The convergence theory we have presented elucidates properties of the min-sum algorithm and builds a bridge to the more established areas of convex analysis and optimization. However, except in very special cases, the algorithm as we have formulated it cannot be implemented on a digital computer, because the messages that are computed and stored are functions over continuous domains. In this section, we present two variations that can be implemented to approximate the behavior of the min-sum algorithm. For simplicity, we restrict attention to the case of synchronous min-sum for pairwise separable convex programs.

Our first approach approximates messages using quadratic functions and can be viewed as a hybrid between the min-sum algorithm and Newton's method. It is easy to show that, if the single-variable factors $\{f_i(\cdot)\}$ are positive definite quadratics and the pairwise factors $\{f_{ij}(\cdot,\cdot)\}$ are positive semidefinite quadratics, then min-sum updates map quadratic messages to quadratic messages. The algorithm we propose here maintains a running estimate $\hat{x}^{(t)}$ of the optimal solution, and at each time approximates each factor by a second-order Taylor expansion. In particular, let $\tilde{f}^{(t)}_i(\cdot)$ be the second-order Taylor expansion of $f_i(\cdot)$ around $\hat{x}^{(t)}_i$, and let $\tilde{f}^{(t)}_{ij}(\cdot,\cdot)$ be the second-order Taylor expansion of $f_{ij}(\cdot,\cdot)$ around $(\hat{x}^{(t)}_i, \hat{x}^{(t)}_j)$. Quadratic messages are updated according to

$$
\tilde{J}^{(t+1)}_{i \to j}(x_j) = \min_{y_i} \Big[ \tilde{f}^{(t)}_i(y_i) + \tilde{f}^{(t)}_{ij}(y_i, x_j) + \sum_{u \in N(i) \setminus j} \tilde{J}^{(t)}_{u \to i}(y_i) \Big] + \kappa^{(t+1)}_{i \to j},
\tag{5.1}
$$

where running estimates of the optimal solution are generated according to

$$
\hat{x}^{(t+1)}_i = \operatorname*{argmin}_{y_i} \Big[ \tilde{f}^{(t)}_i(y_i) + \sum_{u \in N(i)} \tilde{J}^{(t+1)}_{u \to i}(y_i) \Big].
\tag{5.2}
$$

Note that the message update equation (5.1) takes the form of a Riccati equation for a scalar system, which can be carried out efficiently. Further, each optimization problem (5.2) is a scalar unconstrained convex quadratic program.
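The quadratic-approximation scheme (5.1) and (5.2) can be sketched as follows. This is an illustrative implementation under extra assumptions of our own: the pairwise factors are taken to be $f_{ij}(x_i, x_j) = \tfrac{g}{2}(x_i - x_j)^2$, so their Taylor expansions are exact; messages are stored as $\tilde{J}(x) = \tfrac12 \alpha x^2 + \beta x$ with the offset dropped; the initial messages are $f_{ij}(0, x_j)$; and the example objective uses $f_i(x) = \cosh(x - c_i)$ on a 4-cycle. The function and parameter names are hypothetical.

```python
import numpy as np

def quadratic_min_sum(adj, df, d2f, g, iters=200):
    """Min-sum with quadratic messages and Taylor-expanded single-variable factors, (5.1)-(5.2).

    adj: dict i -> list of neighbors; df[i], d2f[i]: first and second derivatives of f_i.
    Pairwise factors are assumed to be g (x_i - x_j)^2 / 2, so their expansions are exact.
    """
    x = {i: 0.0 for i in adj}                         # running estimate x_hat^(t)
    alpha = {(i, j): g for i in adj for j in adj[i]}  # initial message f_ij(0, x_j)
    beta = {(i, j): 0.0 for i in adj for j in adj[i]}
    for _ in range(iters):
        # Taylor expansion of f_i around x_hat_i: (h_i / 2) y^2 + l_i y + constant
        h = {i: d2f[i](x[i]) for i in adj}
        l = {i: df[i](x[i]) - d2f[i](x[i]) * x[i] for i in adj}
        new_alpha, new_beta = {}, {}
        for i, j in alpha:  # scalar Riccati-type update (5.1)
            P = h[i] + g + sum(alpha[(u, i)] for u in adj[i] if u != j)
            c = l[i] + sum(beta[(u, i)] for u in adj[i] if u != j)
            new_alpha[(i, j)] = g - g * g / P
            new_beta[(i, j)] = g * c / P
        alpha, beta = new_alpha, new_beta
        # closed-form minimizer of the scalar quadratic in (5.2)
        x = {i: -(l[i] + sum(beta[(u, i)] for u in adj[i]))
                / (h[i] + sum(alpha[(u, i)] for u in adj[i])) for i in adj}
    return x

# Hypothetical example: f_i(x) = cosh(x - c_i) on a 4-cycle with g = 0.5
c = [1.0, -1.0, 0.5, 0.0]
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
df = {i: (lambda y, ci=c[i]: np.sinh(y - ci)) for i in adj}
d2f = {i: (lambda y, ci=c[i]: np.cosh(y - ci)) for i in adj}
g = 0.5
x_hat = quadratic_min_sum(adj, df, d2f, g)
```

A convenient check is the gradient of $F$ at the returned point, $\sinh(\hat{x}_i - c_i) + g \sum_{u \in N(i)} (\hat{x}_i - \hat{x}_u)$, which should be numerically zero if the iteration has converged.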

A second approach makes use of a piecewise-linear approximation to each message. Let us assume knowledge that the optimal solution $x^*$ is in a closed bounded set $[-B, B]^V$. Let $S = \{x_1, \ldots, x_m\} \subset [-B, B]$, with $-B = x_1 < \cdots < x_m = B$, be a set of points where the linear pieces begin and end. Our approach applies the min-sum update equation to compute values at these points. Then, an approximation to the min-sum message is constructed via linear interpolation between consecutive points or extrapolation beyond the end points. In particular, the algorithm takes the form

$$
J^{(t+1)}_{i \to j}(x_j) = \min_{y_i \in [-B, B]} \Big[ f_i(y_i) + f_{ij}(y_i, x_j) + \sum_{u \in N(i) \setminus j} \hat{J}^{(t)}_{u \to i}(y_i) \Big] + \kappa^{(t+1)}_{i \to j},
\tag{5.3}
$$

for $x_j \in S$, where

$$
\hat{J}^{(t)}_{u \to i}(x_i) = \max_{1 \le k \le m-1} \frac{(x_{k+1} - x_i)\, J^{(t)}_{u \to i}(x_k) + (x_i - x_k)\, J^{(t)}_{u \to i}(x_{k+1})}{x_{k+1} - x_k},
\tag{5.4}
$$

for all $x_i \in \mathbb{R}$. As opposed to the case of quadratic approximations, where each message is parameterized by two numerical values, the number of parameters for each piecewise-linear message grows with $m$. Hence, we anticipate that for fine-grain approximations, our second approach is likely to require greater computational resources. On the other hand, piecewise-linear approximations may extend more effectively to non-convex problems, since non-convex messages are unlikely to be well-approximated by convex quadratic functions.
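The piecewise-linear scheme (5.3) and (5.4) can be sketched in Python as follows. This is an illustration under assumptions of our own choosing, not a prescribed implementation: the continuous minimization over $y_i \in [-B, B]$ in (5.3) is carried out by golden-section search (valid because the objective is convex in $y_i$), the offset normalizes each message to minimum value zero, and all function names are hypothetical. The max-of-chords form of (5.4) reproduces linear interpolation and end-point extrapolation only when the tabulated values are convex, which holds for convex messages.

```python
import numpy as np

def extend_message(S, vals):
    """Piecewise-linear extension (5.4): pointwise max of the chords between consecutive points of S."""
    S = np.asarray(S, dtype=float)
    vals = np.asarray(vals, dtype=float)
    slopes = np.diff(vals) / np.diff(S)

    def J_hat(x):
        x = np.asarray(x, dtype=float)[..., None]
        # chord k at x: vals[k] + slope_k (x - S[k]), equal to the weighted average in (5.4)
        return np.max(vals[:-1] + slopes * (x - S[:-1]), axis=-1)

    return J_hat

def golden_section_min(phi, lo, hi, tol=1e-10):
    """Minimum value of a convex scalar function phi on [lo, hi] by golden-section search."""
    r = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - r * (b - a), a + r * (b - a)
        if phi(c) <= phi(d):
            b = d
        else:
            a = c
    return phi(0.5 * (a + b))

def pwl_message(f_i, f_ij, incoming, S, B):
    """Values of the update (5.3) at the points x_j in S; `incoming` holds extended messages."""
    vals = []
    for xj in S:
        phi = lambda y, xj=xj: f_i(y) + f_ij(y, xj) + sum(J(y) for J in incoming)
        vals.append(float(golden_section_min(phi, -B, B)))
    vals = np.array(vals)
    return vals - vals.min()  # offset kappa: smallest tabulated value is 0

# Hypothetical check: f_i(y) = y^2, f_ij(y, x) = (y - x)^2, no other incoming messages.
# The exact message is x^2 / 2 (attained at y = x / 2), up to a constant offset.
S = np.linspace(-2.0, 2.0, 9)
vals = pwl_message(lambda y: y ** 2, lambda y, x: (y - x) ** 2, [], S, B=5.0)
J_hat = extend_message(S, vals)
```

Between the points of $S$ the extension interpolates (for instance, at $x = 0.25$ it gives $0.0625$), and beyond $x_m = 2$ it continues the last chord linearly.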

6 Open Issues 

There are many open questions in the theory of message passing al- 
gorithms. They fuel a growing research community that cuts across 
communications, artificial intelligence, statistical physics, theoretical 
computer science, and operations research. This paper has focused on 
application of the min-sum message passing algorithm to convex pro- 
grams, and even in this context a number of interesting issues remain 
unresolved. 

Our proof technique establishes convergence under total asynchro- 
nism assuming a scaled diagonal dominance condition. With such a 
flexible model of asynchronous computation, convergence results for 
gradient descent and coordinate descent also require similar diagonal 
dominance assumptions. On the other hand, for the partially asyn- 
chronous setting, where communication delays and times between suc- 
cessive updates are bounded, such assumptions are no longer required 
to guarantee convergence of these two algorithms. It would be in- 
teresting to see whether convergence of the min-sum algorithm under 
partial asynchronism can be established in the absence of scaled diag- 
onal dominance. 

Another direction will be to assess the practical value of the min-sum algorithm for convex optimization problems. This calls for theoretical or empirical analysis of convergence and convergence times for implementable variants such as those proposed in the previous section. Some convergence time results for a special case reported in [11] may provide a starting point. Our expectation is that for most relevant centralized optimization problems, the min-sum algorithm will be more efficient than gradient descent or coordinate descent but fall short of Newton's method. On the other hand, Newton's method does not decentralize gracefully, so in applications that call for a decentralized solution, the min-sum algorithm may prove to be useful.

Finally, it would be interesting to explore whether ideas from this 
paper can be helpful in analyzing behavior of the min-sum algorithm 
for non-convex programs. It is encouraging that convex optimization 
theory has more broadly proved to be useful in designing and analyzing 
approximation methods for non-convex programs. 